Superstore customer sales analysis

A superstore is a very large supermarket, often selling household goods, clothes, and electrical goods.

A superstore with operations across cities in the United States (US) wants to understand key features of its data and extract vital information from it. The analysis also focuses on the items sold and the customers, and predicts 3-month customer lifetime value.

Import libraries

Data Import

Data Exploration

Data Preprocessing

  1. Remove Null Values
  2. Remove Duplicate Values
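
The two preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the `orders` frame and its column names are hypothetical stand-ins for the Superstore data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real notebook would load the Superstore file instead.
orders = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "A-3", "A-4"],
    "Customer ID": ["C1", "C1", "C2", None, "C3"],
    "Sales": [100.0, 100.0, 250.0, 80.0, np.nan],
})

# 1. Remove rows that contain any null value.
cleaned = orders.dropna()

# 2. Remove rows that exactly duplicate an earlier row.
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Dropping nulls first means rows that differ only in a missing field are not kept as separate records; whether dropping is preferable to imputing depends on how many rows are affected.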

Data Insights

Top 10 country customer data

Number of Sales per Category

Time frame of data

Perform RFM Analysis

RFM (recency, frequency, monetary) analysis is a behavior-based technique used to segment customers by examining their transaction history, such as how recently they purchased, how often they purchase, and how much they spend.
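
As a rough sketch of how the three RFM values might be derived from a transaction table, consider the example below. The `tx` frame, its column names, and the choice of snapshot date (one day after the last order) are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical transactions; column names are assumptions.
tx = pd.DataFrame({
    "Customer ID": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "Order ID": ["O1", "O2", "O3", "O4", "O5", "O6"],
    "Order Date": pd.to_datetime([
        "2017-01-05", "2017-03-10", "2016-11-20",
        "2017-02-01", "2017-02-15", "2017-03-30",
    ]),
    "Sales": [120.0, 80.0, 300.0, 50.0, 60.0, 70.0],
})

# Reference date for recency: one day after the last order (an assumption).
snapshot = tx["Order Date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("Customer ID").agg(
    Recency=("Order Date", lambda d: (snapshot - d.max()).days),  # days since last order
    Frequency=("Order ID", "nunique"),                            # number of distinct orders
    Monetary=("Sales", "sum"),                                    # total amount spent
)

print(rfm)
```

A lower Recency means a more recent purchase, so it is read in the opposite direction from Frequency and Monetary.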

Create Visuals

Normalize the Data

Calculate the RFM Score

Outliers

Performing cluster analysis using K-means clustering with the original rfm dataframe

In cluster analysis, the Elbow method is a heuristic used in determining the number of clusters in a data set.

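WCSS (within-cluster sum of squares) is the sum of squared distances between each point and the centroid of its cluster. When WCSS is plotted against K, the curve looks like an elbow. As the number of clusters increases, the WCSS value decreases; it is largest when K = 1. The K at the bend of the elbow, where further clusters yield only small reductions, is usually taken as the number of clusters.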
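
The elbow computation can be sketched with scikit-learn, where the fitted model's `inertia_` attribute holds the WCSS. The data here is synthetic (three generated blobs), used only so the example runs on its own; the notebook would pass the rfm values instead.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the RFM features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

# Fit K-means for K = 1..6 and record the WCSS (inertia) for each K.
wcss = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)

for k, value in zip(range(1, 7), wcss):
    print(f"K={k}: WCSS={value:.1f}")
```

Plotting `wcss` against K would show the steep drop up to K = 3 and a flat tail afterwards, which is the elbow for this synthetic data.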

Plot the graphs

Frequency vs Monetary

In this plot the blue cluster customers are more scattered. These customers have high frequency values, which means they spend more and shop more frequently, whereas green cluster customers spend more even though their frequency is lower than that of the orange cluster customers. Orange cluster customers spend less and shop less frequently than the other two clusters.

Recency vs Monetary

In this plot, blue cluster customers have a recency above 180, yet their monetary value is low, between 2500 and 5500, which means these customers bought recently but their purchase amounts are not very high. Most green cluster customers have a recency around 200, but they are more scattered and their monetary value is about 2500. Orange cluster customers have a higher monetary value (between 5500 and 10000) than the other two clusters but a lower recency; for most of them, recency is between 0 and 100.

Recency vs Frequency

In this plot, the recency of blue cluster customers ranges from 0 to 500 and their frequency from 0 to 35, which means these customers buy recently and frequently. Green cluster customers have a recency from 0 to 800 and a frequency from 0 to 25 and above. Orange cluster customers have a recency from 10 to 35 and a frequency from 0 to 150.

Perform Cluster Analysis with k=4

Perform cluster analysis on the rfm dataframe with outliers removed, following exactly the same steps, including the elbow method, and then plot the graphs.

Frequency vs Monetary

In this plot the blue cluster customers are more scattered. These customers have high frequency values, which means they spend more and shop more frequently, whereas green cluster customers spend more even though their frequency is lower than that of the orange cluster customers. Orange cluster customers spend less and shop less frequently than the other two clusters.

Recency vs Monetary

In this plot, blue cluster customers have a recency above 180, yet their monetary value is low, between 2500 and 5500, which means these customers bought recently but their purchase amounts are not very high. Most green cluster customers have a recency around 200, but they are more scattered and their monetary value is about 2500. Orange cluster customers have a higher monetary value (between 5500 and 10000) than the other two clusters but a lower recency; for most of them, recency is between 0 and 100.

Recency vs Frequency

In this plot, the recency of blue cluster customers ranges from 0 to 500 and their frequency from 0 to 35, which means these customers buy recently and frequently. Green cluster customers have a recency from 0 to 800 and a frequency from 0 to 25 and above. Orange cluster customers have a recency from 10 to 35 and a frequency from 0 to 150.

Calculating the mean for every cluster

Performing cluster analysis using hierarchical clustering with the cleaned rfm dataframe

A dendrogram is a type of tree diagram showing hierarchical clustering - relationships between similar sets of data.
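
A minimal sketch of building the dendrogram structure and cutting it into clusters with SciPy is shown below. The data is synthetic, and both Ward linkage and the two-cluster cut are assumptions chosen for illustration; the notebook's actual settings are not stated.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for the cleaned RFM features: two separated groups.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 3)),
    rng.normal(loc=8.0, scale=0.5, size=(20, 3)),
])

# Agglomerative merge history; each row of Z records one merge.
Z = linkage(X, method="ward")

# Compute the dendrogram layout without drawing it (no_plot=True).
tree = dendrogram(Z, no_plot=True)

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

print(len(tree["leaves"]), sorted(set(labels)))
```

Dropping `no_plot=True` draws the tree with matplotlib; the number of clusters is then typically read off by looking for the longest vertical gap between merges.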

Identify the clusters based on dendrogram

Create all three plots again and observe whether there are any differences from the k-means clustering method.

The difference between the k-means plots and the plots above is that only two clusters are created here, whereas k-means produces however many clusters 'k' specifies.

Evaluate Clustering

Time Series Trends

Repeat Customers

Here we look at the number of monthly repeat purchases, meaning that a customer has placed more than one order within a given month.
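
One way to count repeat customers per month under that definition is sketched here. The `tx` frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical orders; column names are assumptions.
tx = pd.DataFrame({
    "Customer ID": ["C1", "C1", "C2", "C1", "C2", "C2", "C3", "C3"],
    "Order ID": ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8"],
    "Order Date": pd.to_datetime([
        "2017-01-03", "2017-01-20", "2017-01-25", "2017-02-02",
        "2017-02-10", "2017-02-27", "2017-02-14", "2017-02-28",
    ]),
})

# Bucket each order into its calendar month.
tx["Month"] = tx["Order Date"].dt.to_period("M")

# Distinct orders placed by each customer in each month.
orders_per_customer = tx.groupby(["Month", "Customer ID"])["Order ID"].nunique()

# A repeat customer has more than one order in the month; count them per month.
repeat_customers = (orders_per_customer > 1).groupby(level="Month").sum()

print(repeat_customers)
```

Counting distinct order IDs rather than rows avoids treating a multi-item order as several purchases.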

Choropleth map

It is difficult to plot a map because neither state abbreviations nor latitude and longitude are provided. As a workaround, state abbreviations are added for the corresponding states, and a choropleth map is created.
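
The abbreviation step might look like the sketch below. The lookup table is deliberately partial and the `sales` frame is hypothetical; a real run would need an entry for every state in the data.

```python
import pandas as pd

# Partial, illustrative lookup; a full table would cover every state.
STATE_ABBREV = {
    "California": "CA",
    "New York": "NY",
    "Texas": "TX",
    "Washington": "WA",
}

# Hypothetical sales rows; column names are assumptions.
sales = pd.DataFrame({
    "State": ["California", "Texas", "California", "New York", "Washington"],
    "Sales": [500.0, 200.0, 300.0, 450.0, 150.0],
})

# Aggregate sales per state, then attach the two-letter code.
by_state = sales.groupby("State", as_index=False)["Sales"].sum()
by_state["State Code"] = by_state["State"].map(STATE_ABBREV)

# States missing from the lookup would show up here as NaN codes.
unmapped = by_state[by_state["State Code"].isna()]

print(by_state)
```

The resulting `State Code` column is the kind of location key a plotting library expects for US state maps, for example the `locations` argument of `plotly.express.choropleth` with `locationmode="USA-states"`.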

By Sales

Calculate Frequency, Recency, and Total Amount of purchases by each customer

Predict 3 Month Customer Lifetime Value (CLV)

Linear Regression

Evaluate the Regression Model performance